{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Batch processing of files"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using the Python standard libraries (i.e., the `glob` and `os` modules), we can also quickly code up batch operations e.g. over all files with a certain extension in a directory. For example, we can make a list of all `.wav` files in the `audio` directory, use Praat to pre-emphasize these [Sound](../api/parselmouth.Sound.rst#parselmouth.Sound) objects, and then write the pre-emphasized sound to a `WAV` and `AIFF` format file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Find all .wav files in a directory, pre-emphasize and save as new .wav and .aiff file\n",
    "import parselmouth\n",
    "\n",
    "import glob\n",
    "import os.path\n",
    "\n",
    "for wave_file in glob.glob(\"audio/*.wav\"):\n",
    "    print(\"Processing {}...\".format(wave_file))\n",
    "    s = parselmouth.Sound(wave_file)\n",
    "    s.pre_emphasize()\n",
    "    s.save(os.path.splitext(wave_file)[0] + \"_pre.wav\", 'WAV') # or parselmouth.SoundFileFormat.WAV instead of 'WAV'\n",
    "    s.save(os.path.splitext(wave_file)[0] + \"_pre.aiff\", 'AIFF')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After running this, the original home directory now contains all of the original ``.wav`` files pre-emphazised and written again as ``.wav`` and ``.aiff`` files. The reading, pre-emphasis, and writing are all done by Praat, while looping over all ``.wav`` files is done by standard Python code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# List the current contents of the audio/ folder\n",
    "!ls audio/"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Remove the generated audio files again, to clean up the output from this example\n",
    "!rm audio/*_pre.wav\n",
    "!rm audio/*_pre.aiff"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Similarly, we can use the [pandas](https://pandas.pydata.org/) library to read a CSV file with data collected in an experiment, and loop over that data to e.g. extract the mean harmonics-to-noise ratio. The `results` CSV has the following structure:\n",
    "\n",
    "condition | ... | pp_id\n",
    "--------- | --- | -----\n",
    "0         | ... | 1877\n",
    "1         | ... | 801\n",
    "1         | ... | 2456\n",
    "0         | ... | 3126"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following code would read such a table, loop over it, use Praat through Parselmouth to calculate the analysis of each row, and then write an augmented CSV file to disk. To illustrate we use an example set of sound fragments: [results.csv](other/results.csv), [1_b.wav](audio/1_b.wav), [2_b.wav](audio/2_b.wav), [3_b.wav](audio/3_b.wav), [4_b.wav](audio/4_b.wav), [5_b.wav](audio/5_b.wav), [1_y.wav](audio/1_y.wav), [2_y.wav](audio/2_y.wav), [3_y.wav](audio/3_y.wav), [4_y.wav](audio/4_y.wav), [5_y.wav](audio/5_y.wav)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In our example, the original CSV file, [results.csv](other/results.csv) contains the following table:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "print(pd.read_csv(\"other/results.csv\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def analyse_sound(row):\n",
    "    condition, pp_id = row['condition'], row['pp_id']\n",
    "    filepath = \"audio/{}_{}.wav\".format(condition, pp_id)\n",
    "    sound = parselmouth.Sound(filepath)\n",
    "    harmonicity = sound.to_harmonicity()\n",
    "    return harmonicity.values[harmonicity.values != -200].mean()\n",
    "\n",
    "# Read in the experimental results file\n",
    "dataframe = pd.read_csv(\"other/results.csv\")\n",
    "\n",
    "# Apply parselmouth wrapper function row-wise\n",
    "dataframe['harmonics_to_noise'] = dataframe.apply(analyse_sound, axis='columns')\n",
    "\n",
    "# Write out the updated dataframe\n",
    "dataframe.to_csv(\"processed_results.csv\", index=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can now have a look at the results by reading in the `processed_results.csv` file again:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(pd.read_csv(\"processed_results.csv\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Clean up, remove the CSV file generated by this example\n",
    "!rm processed_results.csv"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
